fix(runtime): use provider_opts.context_size for compaction #2814
Merged
Local models not catalogued in models.dev (e.g. DMR with HuggingFace GGUFs) can now supply context_size via provider_opts to enable compaction. When models.dev lookup fails, the runtime falls back to this user-supplied limit, making compaction (proactive threshold and post-overflow recovery) functional for uncatalogued models. Fixes docker#2800
Self-review of the previous commit surfaced four issues:
* compactIfNeeded carried an unused *modelsdev.Model parameter; drop it
and let the call sites pass the resolved contextLimit only.
* EmitStartupInfo and compactWithReason did their own catalogue-only
lookup, so the sidebar's context-percent and the post-compaction
TokenUsageEvent stayed inconsistent with the freshly-fixed compaction
triggers in loop.go and session_compaction.go.
* The provider_opts.context_size fallback was second-class. The user
typed that number in their config, and DMR allocates exactly that
much; treat it as authoritative when set, with the catalogue as
fallback. This also makes the resolution monotonic across providers
rather than depending on whether the catalogue has the model.
* The dual implementation of priority order (catalogue-first in
runStreamLoop, provider-first elsewhere) was a footgun.
Extract resolveContextLimit on LocalRuntime as the single source of
truth. compactionContextLimit, runStreamLoop, EmitStartupInfo and
compactWithReason now route through it, so the sidebar, the proactive
trigger and the LLM compactor all plan against the same number.
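
As a rough illustration of why a single resolved limit matters, the sketch below feeds one number to two consumers: a sidebar percentage and the proactive trigger. The names `contextPercent`, `shouldCompact` and `compactionThreshold` are hypothetical and not taken from the repository; only the 90% figure comes from the PR description.

```go
package main

import "fmt"

// compactionThreshold is the fraction of the context window at which the
// proactive trigger fires. The 0.9 value mirrors the 90% figure in the PR
// description; the constant name is an assumption.
const compactionThreshold = 0.9

// contextPercent is what a sidebar might display. A limit of 0 means the
// limit is unknown, so no percentage can be shown.
func contextPercent(usedTokens, limit int) (float64, bool) {
	if limit <= 0 {
		return 0, false
	}
	return 100 * float64(usedTokens) / float64(limit), true
}

// shouldCompact reports whether usage has crossed the proactive threshold.
// With an unknown limit it never fires, which is the "can't compact" case.
func shouldCompact(usedTokens, limit int) bool {
	if limit <= 0 {
		return false
	}
	return float64(usedTokens) >= compactionThreshold*float64(limit)
}

func main() {
	limit := 8192 // resolved once, then handed to every consumer
	used := 7500

	pct, known := contextPercent(used, limit)
	fmt.Printf("%.1f%% known=%v compact=%v\n", pct, known, shouldCompact(used, limit))
}
```

If each consumer looked up its own limit, the percentage and the trigger could disagree for an uncatalogued model; passing the same value to both removes that possibility.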
rumpl
approved these changes
May 18, 2026
Summary
Fixes #2800.
When Docker Model Runner (DMR) is configured with a model that isn't catalogued in models.dev (typically a HuggingFace GGUF such as `huggingface.co/unsloth/qwen3.5-4b-gguf:Q4_K_M`), automatic compaction silently became a no-op:

- `compactionContextLimit` returned `0`, so the LLM strategy bailed.
- The proactive trigger in `runStreamLoop` never fired.
- `Failed to get model definition` was reported every time the assistant tried to respond, with no way out.

The user already supplies a `context_size` in `provider_opts` for DMR to size the inference context. The runtime now uses that same value as the authoritative context limit when set, falling back to the models.dev catalogue otherwise. This keeps planning aligned with what the engine actually enforces.

Resolution order

1. `provider_opts.context_size` (when set and parseable as a positive integer)
2. The models.dev catalogue entry for the model
3. `0`: caller treats as "can't compact"

A single `LocalRuntime.resolveContextLimit` helper is the source of truth, used by:

- `compactionContextLimit` (LLM compaction strategy)
- `runStreamLoop` proactive 90% trigger
- `EmitStartupInfo` sidebar context-percent on session restore
- `compactWithReason` post-compaction `TokenUsageEvent`

So the sidebar, the proactive trigger, and the LLM compactor all plan against the same number.
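
The resolution order can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the `Model` and `ModelStore` types, the free-function signature, and the set of value shapes accepted by `parseContextSize` are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Model is a hypothetical stand-in for a catalogue entry.
type Model struct{ ContextLimit int }

// ModelStore is a hypothetical stand-in for the models.dev lookup.
type ModelStore interface {
	GetModel(id string) (*Model, error)
}

// parseContextSize accepts a few shapes a decoded config value might take
// (an assumption) and reports whether it is a positive integer.
func parseContextSize(v any) (int, bool) {
	switch n := v.(type) {
	case int:
		return n, n > 0
	case int64:
		return int(n), n > 0
	case float64:
		return int(n), n > 0 && n == float64(int(n))
	case string:
		i, err := strconv.Atoi(n)
		return i, err == nil && i > 0
	}
	return 0, false
}

// resolveContextLimit applies the order described above:
// provider_opts.context_size first, then the catalogue, then 0.
func resolveContextLimit(opts map[string]any, store ModelStore, modelID string) int {
	if size, ok := parseContextSize(opts["context_size"]); ok {
		return size
	}
	if store != nil {
		if m, err := store.GetModel(modelID); err == nil && m != nil && m.ContextLimit > 0 {
			return m.ContextLimit
		}
	}
	return 0 // caller treats this as "can't compact"
}

// failingStore simulates a model that is missing from the catalogue.
type failingStore struct{}

func (failingStore) GetModel(string) (*Model, error) {
	return nil, errors.New("model not found")
}

// fixedStore simulates a catalogued model with a known limit.
type fixedStore struct{ limit int }

func (s fixedStore) GetModel(string) (*Model, error) {
	return &Model{ContextLimit: s.limit}, nil
}

func main() {
	opts := map[string]any{"context_size": 8192}
	fmt.Println(resolveContextLimit(opts, failingStore{}, "uncatalogued")) // 8192
	fmt.Println(resolveContextLimit(nil, fixedStore{32768}, "catalogued")) // 32768
	fmt.Println(resolveContextLimit(nil, failingStore{}, "uncatalogued"))  // 0
}
```

Checking the user-supplied value first means the result does not depend on whether the catalogue happens to know the model, which is the property the PR description emphasises.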
Tests
- `provider_opts.context_size` takes precedence over the catalogue.
- The catalogue is used when `context_size` is unset.
- `provider_opts.context_size` is used when `modelsStore.GetModel` errors (the exact reported scenario).
- `0` is returned when neither source yields a usable limit.

`task lint`: 0 issues. `task test`: full suite passes.
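
The four scenarios map naturally onto a table of cases. The sketch below is an illustration only: `resolve` is a simplified, hypothetical stand-in that takes an already-parsed `context_size` and a catalogue result, and the limits used are made-up values, not figures from the PR's test suite.

```go
package main

import (
	"errors"
	"fmt"
)

// resolve is a simplified stand-in for the resolver. optSize is the parsed
// provider_opts.context_size (0 when unset); catLimit and catErr represent
// the outcome of the catalogue lookup.
func resolve(optSize, catLimit int, catErr error) int {
	if optSize > 0 {
		return optSize
	}
	if catErr == nil && catLimit > 0 {
		return catLimit
	}
	return 0
}

type testCase struct {
	name     string
	optSize  int
	catLimit int
	catErr   error
	want     int
}

// cases lists one entry per scenario in the Tests section.
func cases() []testCase {
	errLookup := errors.New("failed to get model definition")
	return []testCase{
		{"provider_opts wins over catalogue", 8192, 32768, nil, 8192},
		{"catalogue used when context_size unset", 0, 32768, nil, 32768},
		{"provider_opts used when lookup errors", 8192, 0, errLookup, 8192},
		{"zero when neither source is usable", 0, 0, errLookup, 0},
	}
}

func main() {
	for _, tc := range cases() {
		got := resolve(tc.optSize, tc.catLimit, tc.catErr)
		fmt.Printf("%-40s got=%d want=%d ok=%v\n", tc.name, got, tc.want, got == tc.want)
	}
}
```

In a real test file the loop would more likely live in a `func TestXxx(t *testing.T)` and use `t.Run(tc.name, ...)` so each scenario is reported separately.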
Closes #2800